Goto

Collaborating Authors

 medical entity


EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis

Liao, Yusheng, Wu, Chaoyi, Liu, Junwei, Jiang, Shuyang, Qiu, Pengcheng, Wang, Haowen, Yue, Yun, Zhen, Shuai, Wang, Jian, Fan, Qianrui, Gu, Jinjie, Zhang, Ya, Wang, Yanfeng, Wang, Yu, Xie, Weidi

arXiv.org Artificial Intelligence

Electronic Health Records (EHRs) contain rich yet complex information, and their automated analysis is critical for clinical decision-making. Despite recent advances of large language models (LLMs) in clinical workflows, their ability to analyze EHRs remains limited due to narrow task coverage and lack of EHR-oriented reasoning capabilities. This paper aims to bridge the gap, specifically, we present EHR-Ins, a large-scale, comprehensive EHR reasoning instruction dataset, comprising 300k high-quality reasoning cases and 4M non-reasoning cases across 42 distinct EHR tasks. Its core innovation is a thinking-graph-driven framework that enables to generate high-quality reasoning data at scale. Based on it, we develop EHR-R1, a series of reasoning-enhanced LLMs with up to 72B parameters tailored for EHR analysis. Through a multi-stage training paradigm, including domain adaptation, reasoning enhancement, and reinforcement learning, EHR-R1 systematically acquires domain knowledge and diverse reasoning capabilities, enabling accurate and robust EHR analysis. Lastly, we introduce EHR-Bench, a new benchmark curated from MIMIC-IV, spanning 42 tasks, to comprehensively assess reasoning and prediction across EHR scenarios. In experiments, we show that the resulting EHR-R1 consistently outperforms state-of-the-art commercial and open-source LLMs (including DeepSeek-V3 and GPT-4o), surpassing GPT-4o by over 30 points on MIMIC-Bench and achieving a 10\% higher zero-shot AUROC on EHRSHOT. Collectively, EHR-Ins, EHR-R1, and EHR-Bench have significantly advanced the development for more reliable and clinically relevant EHR analysis.


Utilizing Large Language Models for Zero-Shot Medical Ontology Extension from Clinical Notes

Wu, Guanchen, Xie, Yuzhang, Wu, Huanwei, He, Zhe, Shao, Hui, Hu, Xiao, Yang, Carl

arXiv.org Artificial Intelligence

Integrating novel medical concepts and relationships into existing ontologies can significantly enhance their coverage and utility for both biomedical research and clinical applications. Clinical notes, as unstructured documents rich with detailed patient observations, offer valuable context-specific insights and represent a promising yet underutilized source for ontology extension. Despite this potential, directly leveraging clinical notes for ontology extension remains largely unexplored. To address this gap, we propose CLOZE, a novel framework that uses large language models (LLMs) to automatically extract medical entities from clinical notes and integrate them into hierarchical medical ontologies. By capitalizing on the strong language understanding and extensive biomedical knowledge of pre-trained LLMs, CLOZE effectively identifies disease-related concepts and captures complex hierarchical relationships. The zero-shot framework requires no additional training or labeled data, making it a cost-efficient solution. Furthermore, CLOZE ensures patient privacy through automated removal of protected health information (PHI). Experimental results demonstrate that CLOZE provides an accurate, scalable, and privacy-preserving ontology extension framework, with strong potential to support a wide range of downstream applications in biomedical research and clinical informatics.


MediRound: Multi-Round Entity-Level Reasoning Segmentation in Medical Images

Tong, Qinyue, Lu, Ziqian, Liu, Jun, Zuo, Rui, Lu, Zheming

arXiv.org Artificial Intelligence

Despite the progress in medical image segmentation, most existing methods remain task-specific and lack interactiv-ity. Although recent text-prompt-based segmentation approaches enhance user-driven and reasoning-based segmentation, they remain confined to single-round dialogues and fail to perform multi-round reasoning. In this work, we introduce Multi-Round Entity-Level Medical Reasoning Segmentation (MEMR-Seg), a new task that requires generating segmentation masks through multi-round queries with entity-level reasoning. T o support this task, we construct MR-MedSeg, a large-scale dataset of 177K multi-round medical segmentation dialogues, featuring entity-based reasoning across rounds. Furthermore, we propose MediRound, an effective baseline model designed for multi-round medical reasoning segmentation. T o mitigate the inherent error propagation in the chain-like pipeline of multi-round segmentation, we introduce a lightweight yet effective Judgment & Correction Mechanism during model inference. Experimental results demonstrate that our method effectively tackles the MEMR-Seg task, surpassing conventional medical referring segmentation approaches. The project is available at https://github.com/ 1


Enhancing Medical Dialogue Generation through Knowledge Refinement and Dynamic Prompt Adjustment

Sun, Hongda, Peng, Jiaren, Yang, Wenzhong, He, Liang, Du, Bo, Yan, Rui

arXiv.org Artificial Intelligence

Medical dialogue systems (MDS) have emerged as crucial online platforms for enabling multi-turn, context-aware conversations with patients. However, existing MDS often struggle to (1) identify relevant medical knowledge and (2) generate personalized, medically accurate responses. To address these challenges, we propose MedRef, a novel MDS that incorporates knowledge refining and dynamic prompt adjustment. First, we employ a knowledge refining mechanism to filter out irrelevant medical data, improving predictions of critical medical entities in responses. Additionally, we design a comprehensive prompt structure that incorporates historical details and evident details. To enable real-time adaptability to diverse patient conditions, we implement two key modules, Triplet Filter and Demo Selector, providing appropriate knowledge and demonstrations equipped in the system prompt. Extensive experiments on MedDG and KaMed benchmarks show that MedRef outperforms state-of-the-art baselines in both generation quality and medical entity accuracy, underscoring its effectiveness and reliability for real-world healthcare applications.


Semantic Textual Similarity Assessment in Chest X-ray Reports Using a Domain-Specific Cosine-Based Metric

Picha, Sayeh Gholipour, Chanti, Dawood Al, Caplier, Alice

arXiv.org Artificial Intelligence

Medical language processing and deep learning techniques have emerged as critical tools for improving healthcare, particularly in the analysis of medical imaging and medical text data. These multimodal data fusion techniques help to improve the interpretation of medical imaging and lead to increased diagnostic accuracy, informed clinical decisions, and improved patient outcomes. The success of these models relies on the ability to extract and consolidate semantic information from clinical text. This paper addresses the need for more robust methods to evaluate the semantic content of medical reports. Conventional natural language processing approaches and metrics are initially designed for considering the semantic context in the natural language domain and machine translation, often failing to capture the complex semantic meanings inherent in medical content. In this study, we introduce a novel approach designed specifically for assessing the semantic similarity between generated medical reports and the ground truth. Our approach is validated, demonstrating its efficiency in assessing domain-specific semantic similarity within medical contexts. By applying our metric to state-of-the-art Chest X-ray report generation models, we obtain results that not only align with conventional metrics but also provide more contextually meaningful scores in the considered medical domain.


MEPNet: Medical Entity-balanced Prompting Network for Brain CT Report Generation

Zhang, Xiaodan, Shi, Yanzhao, Ji, Junzhong, Zheng, Chengxin, Qu, Liangqiong

arXiv.org Artificial Intelligence

The automatic generation of brain CT reports has gained widespread attention, given its potential to assist radiologists in diagnosing cranial diseases. However, brain CT scans involve extensive medical entities, such as diverse anatomy regions and lesions, exhibiting highly inconsistent spatial patterns in 3D volumetric space. This leads to biased learning of medical entities in existing methods, resulting in repetitiveness and inaccuracy in generated reports. To this end, we propose a Medical Entity-balanced Prompting Network (MEPNet), which harnesses the large language model (LLM) to fairly interpret various entities for accurate brain CT report generation. By introducing the visual embedding and the learning status of medical entities as enriched clues, our method prompts the LLM to balance the learning of diverse entities, thereby enhancing reports with comprehensive findings. First, to extract visual embedding of entities, we propose Knowledge-driven Joint Attention to explore and distill entity patterns using both explicit and implicit medical knowledge. Then, a Learning Status Scorer is designed to evaluate the learning of entity visual embeddings, resulting in unique learning status for individual entities. Finally, these entity visual embeddings and status are elaborately integrated into multi-modal prompts, to guide the text generation of LLM. This process allows LLM to self-adapt the learning process for biased-fitted entities, thereby covering detailed findings in generated reports. We conduct experiments on two brain CT report generation benchmarks, showing the effectiveness in clinical accuracy and text coherence.


MKG-Rank: Enhancing Large Language Models with Knowledge Graph for Multilingual Medical Question Answering

Li, Feiyang, Chen, Yingjian, Liu, Haoran, Yang, Rui, Yuan, Han, Jiang, Yuang, Li, Tianxiao, Taylor, Edison Marrese, Rouhizadeh, Hossein, Iwasawa, Yusuke, Teodoro, Douglas, Matsuo, Yutaka, Li, Irene

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable progress in medical question answering (QA), yet their effectiveness remains predominantly limited to English due to imbalanced multilingual training data and scarce medical resources for low-resource languages. To address this critical language gap in medical QA, we propose Multilingual Knowledge Graph-based Retrieval Ranking (MKG-Rank), a knowledge graph-enhanced framework that enables English-centric LLMs to perform multilingual medical QA. Through a word-level translation mechanism, our framework efficiently integrates comprehensive English-centric medical knowledge graphs into LLM reasoning at a low cost, mitigating cross-lingual semantic distortion and achieving precise medical QA across language barriers. To enhance efficiency, we introduce caching and multi-angle ranking strategies to optimize the retrieval process, significantly reducing response times and prioritizing relevant medical knowledge. Extensive evaluations on multilingual medical QA benchmarks across Chinese, Japanese, Korean, and Swahili demonstrate that MKG-Rank consistently outperforms zero-shot LLMs, achieving maximum 35.03% increase in accuracy, while maintaining an average retrieval time of only 0.0009 seconds.


AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow

Yu, Huizi, Zhou, Jiayan, Li, Lingyao, Chen, Shan, Gallifant, Jack, Shi, Anye, Li, Xiang, Hua, Wenyue, Jin, Mingyu, Chen, Guang, Zhou, Yang, Li, Zhao, Gupte, Trisha, Chen, Ming-Li, Azizi, Zahra, Zhang, Yongfeng, Assimes, Themistocles L., Ma, Xin, Bitterman, Danielle S., Lu, Lin, Fan, Lizhou

arXiv.org Artificial Intelligence

Simulated patient systems play a crucial role in modern medical education and research, providing safe, integrative learning environments and enabling clinical decision-making simulations. Large Language Models (LLM) could advance simulated patient systems by replicating medical conditions and patient-doctor interactions with high fidelity and low cost. However, ensuring the effectiveness and trustworthiness of these systems remains a challenge, as they require a large, diverse, and precise patient knowledgebase, along with a robust and stable knowledge diffusion to users. Here, we developed AIPatient, an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as the generation backbone. AIPatient KG samples data from Electronic Health Records (EHRs) in the Medical Information Mart for Intensive Care (MIMIC)-III database, producing a clinically diverse and relevant cohort of 1,495 patients with high knowledgebase validity (F1 0.89). Reasoning RAG leverages six LLM powered agents spanning tasks including retrieval, KG query generation, abstraction, checker, rewrite, and summarization. This agentic framework reaches an overall accuracy of 94.15% in EHR-based medical Question Answering (QA), outperforming benchmarks that use either no agent or only partial agent integration. Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1). The promising performance of the AIPatient system highlights its potential to support a wide range of applications, including medical education, model evaluation, and system integration.


Generating Synthetic Free-text Medical Records with Low Re-identification Risk using Masked Language Modeling

Belkadi, Samuel, Ren, Libo, Micheletti, Nicolo, Han, Lifeng, Nenadic, Goran

arXiv.org Artificial Intelligence

In this paper, we present a system that generates synthetic free-text medical records, such as discharge summaries, admission notes and doctor correspondences, using Masked Language Modeling (MLM). Our system is designed to preserve the critical information of the records while introducing significant diversity and minimizing re-identification risk. The system incorporates a de-identification component that uses Philter to mask Protected Health Information (PHI), followed by a Medical Entity Recognition (NER) model to retain key medical information. We explore various masking ratios and mask-filling techniques to balance the trade-off between diversity and fidelity in the synthetic outputs without affecting overall readability. Our results demonstrate that the system can produce high-quality synthetic data with significant diversity while achieving a HIPAA-compliant PHI recall rate of 0.96 and a low re-identification risk of 0.035. Furthermore, downstream evaluations using a NER task reveal that the synthetic data can be effectively used to train models with performance comparable to those trained on real data. The flexibility of the system allows it to be adapted for specific use cases, making it a valuable tool for privacy-preserving data generation in medical research and healthcare applications.


Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds

WU, Jiageng, Wu, Xian, Yang, Jie

arXiv.org Artificial Intelligence

Clinical reasoning refers to the cognitive process that physicians employ in evaluating and managing patients. This process typically involves suggesting necessary examinations, diagnosing patients' diseases, and deciding on appropriate therapies, etc. Accurate clinical reasoning requires extensive medical knowledge and rich clinical experience, setting a high bar for physicians. This is particularly challenging in developing countries due to the overwhelming number of patients and limited physician resources, contributing significantly to global health inequity and necessitating automated clinical reasoning approaches. Recently, the emergence of large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated their potential in clinical reasoning. However, these LLMs are prone to hallucination problems, and the reasoning process of LLMs may not align with the clinical decision path of physicians. In this study, we introduce a novel framework, In-Context Padding (ICP), designed to enhance LLMs with medical knowledge. Specifically, we infer critical clinical reasoning elements (referred to as knowledge seeds) and use these as anchors to guide the generation process of LLMs. Experiments on two clinical question datasets demonstrate that ICP significantly improves the clinical reasoning ability of LLMs.